Amendment to the Claims



This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 



1 Claims 1-18 (canceled). 

1 19. (currently amended) A system [[according to Claim 18, further]] for

2 providing efficient document scoring of concepts within and clustering of 

3 documents in an electronically-stored document set, comprising: 

4 [[the]] a scoring module [[evaluating the score]] scoring a document in an

5 electronically-stored document set, comprising: 

6 a frequency submodule determining a frequency of occurrence of 

7 at least one concept within a document; 

8 a concept weight submodule analyzing a concept weight reflecting 

9 a specificity of meaning for the at least one concept within the document, wherein 

10 the concept weight is based on a number of terms for the at least one concept; 

11 a structural weight submodule analyzing a structural weight

12 reflecting a degree of significance based on structural location within the 

13 document for the at least one concept; 

14 a corpus weight submodule analyzing a corpus weight inversely 

15 weighing a reference count of occurrences for the at least one concept within the 

16 document; 

17 a scoring evaluation submodule evaluating a score to be associated 

18 with the at least one concept as a function of a summation of the frequency,

19 concept weight, structural weight, and corpus weight in accordance with the

20 formula:

21 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

22 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

23 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

24 comprises the corpus weight for occurrence j of concept i;

25 a vector submodule forming the score assigned to the at least one 

26 concept as a normalized score vector for each such document in the 

27 electronically-stored document set; and 

28 a determination submodule determining a similarity between the 

29 normalized score vector for each such document as an inner product of each 

30 normalized score vector;

31 a clustering module grouping the documents by the score into a plurality 

32 of clusters, comprising: 

33 a selection submodule selecting a set of candidate seed documents 

34 from the electronically-stored document set; 

35 a cluster seed submodule identifying seed documents by applying 

36 the similarity to each such candidate seed document and selecting those candidate 

37 seed documents that are sufficiently unique from other candidate seed documents 

38 as the seed documents; 

39 an identification submodule identifying a plurality of non-seed 

40 documents; 

41 a comparison submodule determining the similarity between each 

42 non-seed document and a cluster center of each cluster; and 

43 a clustering submodule assigning each such non-seed document to 

44 the cluster with a best fit, subject to a minimum fit; and 

45 a threshold module relocating outlier documents, comprising determining 

46 the similarity between each of the documents grouped into each cluster based on 

47 the center of the cluster and the scores assigned to each of the at least one 

48 concepts in that document, dynamically determining a threshold for each cluster 

49 as a function of the similarity between each of the documents, and identifying and 

50 reassigning each of the documents with the similarity falling outside the 

51 threshold.
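
The scoring portion of Claim 19 can be illustrated with a minimal sketch. It assumes each occurrence of a concept is supplied as a (frequency, concept weight, structural weight, corpus weight) tuple and that each document's scores are held in a dictionary keyed by concept; neither representation is recited in the claim, and the function names are hypothetical.

```python
import math


def concept_score(occurrences):
    """Sum frequency * concept weight * structural weight * corpus weight.

    The tuple layout (f, cw, sw, rw) is an assumption of this sketch.
    """
    return sum(f * cw * sw * rw for f, cw, sw, rw in occurrences)


def normalize(scores):
    """Scale a {concept: score} mapping to unit Euclidean length."""
    length = math.sqrt(sum(s * s for s in scores.values()))
    if length == 0.0:
        return dict(scores)
    return {concept: s / length for concept, s in scores.items()}


def inner_product(a, b):
    """Inner product of two sparse score vectors stored as dictionaries."""
    return sum(value * b[concept] for concept, value in a.items() if concept in b)
```

Because both vectors are normalized first, the inner product of a document with itself is 1 and the inner product of documents sharing no concepts is 0.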

20. (previously presented) A system according to Claim 19, further
comprising:

the concept weight module evaluating the concept weight in accordance
with the formula:

$$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$$

where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises the number of terms for
occurrence j of each such concept i.
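
As an illustration only, the piecewise concept weight can be sketched as a small function. The sketch assumes the three-case form (one to three terms, four to six terms, seven or more terms) that the corresponding method claim also recites; the function name and the error raised for a term count below one are hypothetical choices.

```python
def concept_weight(num_terms):
    """Return the concept weight for a concept made of num_terms terms."""
    if num_terms < 1:
        # Behavior for fewer than one term is not defined; this is an assumption.
        raise ValueError("a concept needs at least one term")
    if num_terms <= 3:
        return 0.25 + 0.25 * num_terms
    if num_terms <= 6:
        return 0.25 + 0.25 * (7 - num_terms)
    return 0.25
```

Under this reading the weight peaks at 1.0 for concepts of three or four terms and falls back to 0.25 for very long concepts.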

21. (previously presented) A system according to Claim 19, further
comprising: 

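the structural weight module evaluating the structural weight in
accordance with the formula:

$$sw_{ij} = \begin{cases} 1.0, & \text{if } (j \approx \text{SUBJECT}) \\ 0.8, & \text{if } (j \approx \text{HEADING}) \\ 0.7, & \text{if } (j \approx \text{SUMMARY}) \\ 0.5, & \text{if } (j \approx \text{BODY}) \\ 0.1, & \text{if } (j \approx \text{SIGNATURE}) \end{cases}$$

where $sw_{ij}$ comprises the structural weight for occurrence j of each such concept i.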
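
The structural weight amounts to a lookup from the location of an occurrence to a fixed value. The sketch below assumes the five locations and weights recited in the corresponding method claim; the table name, the function name, and the case-insensitive lookup are hypothetical.

```python
# Weights by structural location, as recited for SUBJECT through SIGNATURE.
STRUCTURAL_WEIGHTS = {
    "SUBJECT": 1.0,
    "HEADING": 0.8,
    "SUMMARY": 0.7,
    "BODY": 0.5,
    "SIGNATURE": 0.1,
}


def structural_weight(location):
    """Return the structural weight for an occurrence at the given location.

    Raises KeyError for a location outside the table (an assumption; no
    behavior is recited for other locations).
    """
    return STRUCTURAL_WEIGHTS[location.upper()]
```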



22. (previously presented) A system according to Claim 19, further 
comprising: 

the corpus weight module evaluating the corpus weight in accordance with
the formula:

$$rw_{ij} = \begin{cases} \left(\dfrac{T - r_{ij}}{T}\right)^{2}, & r_{ij} > M \\ 1.0, & r_{ij} \le M \end{cases}$$

where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for
occurrence j of each such concept i, T comprises a total number of reference
counts of documents in the document set, and M comprises a maximum reference
count of documents in the document set.

Response to Final Office Action
Docket No. 013.0207.US.UTL

1 23. (previously presented) A system according to Claim 19, further 

2 comprising: 

3 a compression module compressing the score in accordance with the 

4 formula: 

5 $S_i' = \log(S_i + 1)$

6 where $S_i'$ comprises the compressed score for each such concept i.
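
A minimal sketch of the compression step follows. The claim does not state the base of the logarithm, so the natural logarithm is assumed here; the function name is hypothetical.

```python
import math


def compress_score(score):
    """Compress a raw concept score as log(score + 1), natural log assumed."""
    return math.log1p(score)
```

Adding one before taking the logarithm maps a score of zero to zero and damps large scores without changing their order.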

1 24. (currently amended) A system according to [[Claim 18]] Claim 19,

2 further comprising: 

3 a global stop concept vector cache maintaining concepts and terms; and 

4 a filtering module filtering selection of the at least one concept based on 

5 the concepts and terms maintained in the global stop concept vector cache. 
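
The filtering of Claim 24 can be pictured as a set-membership test. In this sketch the global stop concept vector cache is assumed to be an in-memory collection of concepts to exclude; the claim does not say how the cache is stored, and the function name is hypothetical.

```python
def filter_concepts(concepts, stop_concepts):
    """Drop every concept that appears in the stop concept collection."""
    stop = frozenset(stop_concepts)
    return [concept for concept in concepts if concept not in stop]
```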

1 25. (currently amended) A system according to [[Claim 18]] Claim 19,

2 further comprising: 

3 a parsing module identifying terms within at least one document in the 

4 document set, and combining the identified terms into one or more of the

5 concepts. 

1 26. (original) A system according to Claim 25, further comprising: 

2 the parsing module structuring each such identified term in the one or 

3 more concepts into canonical concepts comprising at least one of word root, 

4 character case, and word ordering. 
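
One way to read the canonical form of Claim 26 is sketched below: each term is lowercased (character case), reduced by a placeholder stemmer (word root), and the terms are sorted (word ordering). The plural-stripping rule merely stands in for a real stemmer and is an assumption, as is the function name.

```python
def canonical_concept(terms):
    """Return a concept as a tuple of lowercased, stemmed, sorted terms."""

    def root(term):
        # Placeholder stemmer (assumption): strip a trailing plural "s" only.
        if len(term) > 3 and term.endswith("s"):
            return term[:-1]
        return term

    return tuple(sorted(root(term.lower()) for term in terms))
```

With this normalization, "Documents scoring" and "Scoring document" map to the same concept.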

1 27. (original) A system according to Claim 25, wherein at least one of

2 nouns, proper nouns and adjectives are included as terms. 

1 Claims 28-30 (canceled). 

1 31. (currently amended) A system according to [[Claim 18]] Claim 19,

2 further comprising:

3 the similarity submodule calculating the similarity in accordance with the

4 formula:

5 $\cos\sigma_{AB} = \dfrac{\langle \vec{S}_A \cdot \vec{S}_B \rangle}{|\vec{S}_A|\,|\vec{S}_B|}$,

6 where $\cos\sigma_{AB}$ comprises a similarity between a document A and a document B,

7 $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for

8 document B.
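
The similarity of Claim 31 is the cosine of the angle between two score vectors. The sketch assumes each score vector is a dictionary mapping concepts to scores and returns 0.0 when either vector is empty or all zero; both choices, and the function name, are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse score vectors (dictionaries)."""
    dot = sum(value * b.get(concept, 0.0) for concept, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```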

1 Claims 32-35 (canceled). 

1 36. (currently amended) A method [[according to Claim 35, further]] for

2 providing efficient document scoring of concepts within and clustering of

3 documents in an electronically-stored document set, comprising:

4 [[evaluating the score]] scoring a document in an electronically-stored

5 document set, comprising: 

6 determining a frequency of occurrence of at least one concept 

7 within a document; 

8 analyzing a concept weight reflecting a specificity of meaning for 

9 the at least one concept within the document, wherein the concept weight is based 

10 on a number of terms for the at least one concept; 

11 analyzing a structural weight reflecting a degree of significance

12 based on structural location within the document for the at least one concept; 

13 analyzing a corpus weight inversely weighing a reference count of 

14 occurrences for the at least one concept within the document; and 

15 evaluating a score to be associated with the at least one concept as

16 a function of a summation of the frequency, concept weight, structural weight,

17 and corpus weight in accordance with the formula:

18 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

19 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

20 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

21 comprises the corpus weight for occurrence j of concept i;

22 forming the score assigned to the at least one concept as a normalized 

23 score vector for each such document in the electronically-stored document set; 

24 determining a similarity between the normalized score vector for each 

25 such document as an inner product of each normalized score vector;

26 grouping the documents by the score into a plurality of clusters, 

27 comprising: 

28 selecting a set of candidate seed documents from the 

29 electronically-stored document set; 

30 identifying seed documents by applying the similarity to each such 

31 candidate seed document and selecting those candidate seed documents that are 

32 sufficiently unique from other candidate seed documents as the seed documents; 

33 identifying a plurality of non-seed documents; 

34 determining the similarity between each non-seed document and a 

35 center of each cluster; and 

36 assigning each non-seed document to the cluster with a best fit, 

37 subject to a minimum fit; and 

38 relocating outlier documents, comprising: 

39 determining the similarity between each of the documents grouped 

40 into each cluster based on the center of the cluster and the scores assigned to each 

41 of the at least one concepts in that document; 

42 dynamically determining a threshold for each cluster as a function 

43 of the similarity between each of the documents; and

44 identifying and reassigning each of the documents with the 

45 similarity falling outside the threshold.
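
The grouping steps of Claim 36 can be sketched as follows, under several stated assumptions: documents and cluster centers are normalized score vectors held as dictionaries, a candidate counts as "sufficiently unique" when its similarity to every seed already accepted stays below a caller-supplied limit, and the "minimum fit" is a caller-supplied similarity floor. The claim fixes none of these details, and all names are hypothetical.

```python
def dot(a, b):
    """Inner product of two sparse normalized score vectors."""
    return sum(value * b.get(concept, 0.0) for concept, value in a.items())


def select_seeds(candidates, max_similarity):
    """Keep each candidate that is dissimilar to every seed accepted so far."""
    seeds = []
    for doc in candidates:
        if all(dot(doc, seed) < max_similarity for seed in seeds):
            seeds.append(doc)
    return seeds


def assign_to_cluster(doc, centers, min_fit):
    """Return the index of the best-fitting center, or None below min_fit."""
    fits = [dot(doc, center) for center in centers]
    if not fits:
        return None
    best = max(range(len(fits)), key=fits.__getitem__)
    return best if fits[best] >= min_fit else None
```

A document for which `assign_to_cluster` returns None fits no cluster well enough; how such a document is then handled is left open by this sketch.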

1 37. (previously presented) A method according to Claim 36, further

2 comprising:

3 evaluating the concept weight in accordance with the formula:

4 $$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$$

5 where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises the number of terms for

6 occurrence j of each such concept i.



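1 38. (previously presented) A method according to Claim 36, further

2 comprising:

3 evaluating the structural weight in accordance with the formula:

4 $$sw_{ij} = \begin{cases} 1.0, & \text{if } (j \approx \text{SUBJECT}) \\ 0.8, & \text{if } (j \approx \text{HEADING}) \\ 0.7, & \text{if } (j \approx \text{SUMMARY}) \\ 0.5, & \text{if } (j \approx \text{BODY}) \\ 0.1, & \text{if } (j \approx \text{SIGNATURE}) \end{cases}$$

5 where $sw_{ij}$ comprises the structural weight for occurrence j of each such concept i.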



1 39. (previously presented) A method according to Claim 36, further 

2 comprising: 

3 evaluating the corpus weight in accordance with the formula: 



4 $$rw_{ij} = \begin{cases} \left(\dfrac{T - r_{ij}}{T}\right)^{2}, & r_{ij} > M \\ 1.0, & r_{ij} \le M \end{cases}$$

5 where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for

6 occurrence j of each such concept i, T comprises a total number of reference

7 counts of documents in the document set, and M comprises a maximum reference

8 count of documents in the document set.
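
The corpus weight can be sketched as a function of the three quantities named in the claim. The sketch assumes the two-case form in which a concept referenced more than M times is damped by the square of (T - r) / T and all other concepts keep a weight of 1.0; the parameter names are hypothetical.

```python
def corpus_weight(ref_count, total_refs, max_refs):
    """Inversely weigh a concept by how widely it is referenced.

    ref_count  -- reference count r for the concept occurrence
    total_refs -- total number of reference counts T (must be positive)
    max_refs   -- maximum reference count M
    """
    if ref_count > max_refs:
        return ((total_refs - ref_count) / total_refs) ** 2
    return 1.0
```

For example, with T = 100 and M = 10, a concept referenced 50 times is weighted 0.25 while one referenced 5 times keeps the full weight of 1.0.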

1 40. (previously presented) A method according to Claim 36, further

2 comprising:

3 compressing the score in accordance with the formula:

4 $S_i' = \log(S_i + 1)$

5 where $S_i'$ comprises the compressed score for each such concept i.

1 41. (currently amended) A method according to [[Claim 35]] Claim 36,

2 further comprising: 

3 maintaining concepts and terms in a global stop concept vector cache; and 

4 filtering selection of the at least one concept based on the concepts and 

5 terms maintained in the global stop concept vector cache. 

1 42. (currently amended) A method according to [[Claim 35]] Claim 36,

2 further comprising: 

3 identifying terms within at least one document in the document set; and 

4 combining the identified terms into one or more of the concepts. 

1 43. (original) A method according to Claim 42, further comprising: 

2 structuring each such identified term in the one or more concepts into 

3 canonical concepts comprising at least one of word root, character case, and word 

4 ordering. 

1 44. (original) A method according to Claim 42, further comprising: 

2 including as terms at least one of nouns, proper nouns and adjectives. 

1 Claims 45-47 (canceled). 

1 48. (currently amended) A method according to [[Claim 35]] Claim 36,

2 further comprising:

3 calculating the similarity in accordance with the formula:

4 $\cos\sigma_{AB} = \dfrac{\langle \vec{S}_A \cdot \vec{S}_B \rangle}{|\vec{S}_A|\,|\vec{S}_B|}$,

5 where $\cos\sigma_{AB}$ comprises a similarity between a document A and a document B,

6 $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for

7 document B.

1 Claims 49-51 (canceled).

1 52. (currently amended) A computer-readable storage medium holding

2 code for providing efficient document scoring of concepts within and clustering 

3 of documents in an electronically-stored document set, comprising: 

4 code for scoring a document in an electronically-stored document set, 

5 comprising: 

6 code for determining a frequency of occurrence of at least one 

7 concept within a document; 

8 code for analyzing a concept weight reflecting a specificity of 

9 meaning for the at least one concept within the document, wherein the concept

10 weight is based on a number of terms for the at least one concept;

11 code for analyzing a structural weight reflecting a degree of

12 significance based on structural location within the document for the at least one

13 concept;

14 code for analyzing a corpus weight inversely weighing a reference

15 count of occurrences for the at least one concept within the document; and

16 code for evaluating a score to be associated with the at least one

17 concept as a function of a summation of the frequency, concept weight, structural

18 weight, and corpus weight in accordance with the formula:

19 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

20 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

21 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

22 comprises the corpus weight for occurrence j of concept i;

23 code for forming the score assigned to the at least one concept as a 

24 normalized score vector for each such document in the electronically-stored 

25 document set; 

26 code for determining a similarity between the normalized score vector for 

27 each such document as an inner product of each normalized score vector; 

28 code for grouping the documents by the score into a plurality of clusters, 

29 comprising:

30 code for selecting a set of candidate seed documents from the 

31 electronically-stored document set;

32 code for identifying seed documents by applying the similarity to 

33 each such candidate seed document and selecting those candidate seed documents 

34 that are sufficiently unique from other candidate seed documents as the seed 

35 documents; 

36 code for identifying a plurality of non-seed documents; 

37 code for determining the similarity between each non-seed 

38 document and a center of each cluster; and 

39 code for assigning each non-seed document to the cluster with a 

40 best fit, subject to a minimum fit; and 

41 code for relocating outlier documents, comprising: 

42 code for determining the similarity between each of the documents 

43 grouped into each cluster based on the center of the cluster and the scores 

44 assigned to each of the at least one concepts in that document; 

45 code for dynamically determining a threshold for each cluster as a 

46 function of the similarity between each of the documents; and 

47 code for identifying and reassigning each of the documents with 

48 the similarity falling outside the threshold. 
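
The outlier step leaves open how the per-cluster threshold is derived from the member similarities. The sketch below makes one possible choice, the mean similarity to the cluster center minus a multiple of the population standard deviation; that rule, the default multiple, and the function name are assumptions and are not recited in the claims.

```python
import statistics


def find_outliers(similarities, num_deviations=2.0):
    """Return (threshold, indices of members whose similarity falls below it).

    similarities holds each cluster member's similarity to the cluster center.
    The mean-minus-deviations rule is an illustrative assumption.
    """
    if len(similarities) < 2:
        return None, []
    mean = statistics.mean(similarities)
    spread = statistics.pstdev(similarities)
    threshold = mean - num_deviations * spread
    outliers = [i for i, s in enumerate(similarities) if s < threshold]
    return threshold, outliers
```

Because the threshold is recomputed from each cluster's own similarities, a tight cluster rejects members that a loose cluster would keep. The documents identified this way would then be reassigned, for instance by repeating the best-fit comparison against the cluster centers.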

1 53. (currently amended) An apparatus for providing efficient

2 document scoring of concepts within and clustering of documents in an

3 electronically-stored document set, comprising:

4 means for scoring a document in an electronically-stored document set,

5 comprising: 

6 means for determining a frequency of occurrence of at least one 

7 concept within a document; 

8 means for analyzing a concept weight reflecting a specificity of 

9 meaning for the at least one concept within the document, wherein the concept 

10 weight is based on a number of terms for the at least one concept; 

11 means for analyzing a structural weight reflecting a degree of

12 significance based on structural location within the document for the at least one

13 concept;

14 means for analyzing a corpus weight inversely weighing a

15 reference count of occurrences for the at least one concept within the document;

16 and

17 means for evaluating a score to be associated with the at least one

18 concept as a function of a summation of the frequency, concept weight, structural

19 weight, and corpus weight in accordance with the formula:

20 $S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$

21 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

22 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

23 comprises the corpus weight for occurrence j of concept i;

24 means for forming the score assigned to the at least one concept as a 

25 normalized score vector for each such document in the electronically-stored 

26 document set; 

27 means for determining a similarity between the normalized score vector 

28 for each such document as an inner product of each normalized score vector; 

29 means for grouping the documents by the score into a plurality of clusters,

30 comprising:

31 means for selecting a set of candidate seed documents from the

32 electronically-stored document set; 

33 means for identifying seed documents by applying the similarity to 

34 each such candidate seed document and selecting those candidate seed documents 

35 that are sufficiently unique from other candidate seed documents as the seed 

36 documents; 

37 means for identifying a plurality of non-seed documents; 

38 means for determining the similarity between each non-seed 

39 document and a center of each cluster; and 

40 means for assigning each non-seed document to the cluster with a 

41 best fit, subject to a minimum fit; and 

42 means for relocating outlier documents, comprising: 

43 means for determining the similarity between each of the 

44 documents grouped into each cluster based on the center of the cluster and the 

45 scores assigned to each of the at least one concepts in that document; 

46 means for dynamically determining a threshold for each cluster as 

47 a function of the similarity between each of the documents; and 

48 means for identifying and reassigning each of the documents with 

49 the similarity falling outside the threshold. 